A Semantic Method for Searching Knowledge in a Software Development Context
Authors: S. Bergamaschi, R. Martoglia, and S. Sorrentino
Abstract
The FACIT-SME European FP7 project aims to facilitate the use and sharing of Software Engineering (SE) methods and best practices among software-developing SMEs. In this context, we present an automatic semantic document search method based on Word Sense Disambiguation, which exploits both the syntactic and the semantic information provided by external dictionaries and can easily be applied by any SME.
1 Introduction and Motivation
In recent years, software development in Europe has become a bottleneck in the development of the Information Society, especially for SMEs (Small and Medium Enterprises), which must devote almost all of their available resources to production rather than to training in new technologies. The main goal of the three-year European FP7 project “Facilitate IT-providing SMEs by Operation-related Models and Methods (FACIT-SME)” is to help IT SMEs share and (re)use SE methods, tools, and experiences in order to systematically design and develop applications integrated with their business processes. To achieve this goal, the project proposes a novel Open Reference Model (ORM) [4], which serves as the underlying knowledge backbone: it stores existing reference knowledge for software-developing SMEs, including engineering methods, tools, quality model requirements, and enterprise model fragments of IT SMEs, in a computer-processable form. On top of the ORM repository, a customizable Open Source Enactment System (OSES) [3] provides IT support for the project-specific application of the ORM. As a key part of the OSES, specific query-based search methods support organizations in two scenarios: finding a new methodology by selecting the ORM elements that best match given enterprise objectives (the “From Scratch” scenario), and improving an existing methodology by suggesting the ORM information most relevant to it (the “From methodology” scenario).
This extended abstract summarizes the research work we performed in the first two years of the FACIT-SME project, including a summary of the preliminary results described in [12] (SEKE 2011). This work was partially supported by the European Community’s Seventh Framework Programme managed by the Research Executive Agency (REA) (http://ec.europa.eu/research/rea) ([FP7/2007-2013][FP7/2007 2011]).
Nicola Ferro and Letizia Tanca (Eds.): SEBD 2012, Edizioni Libreria Progetto, Padova, Italia ISBN: 978-88-96477-23-6, Copyright (c) 2012 Edizioni Libreria Progetto and the authors
S. Bergamaschi, R. Martoglia, and S. Sorrentino
In our research work, we focus on search/filtering methods that take advantage of the textual information (which we will refer to as documents) stored in the ORM and/or already available in each enterprise. In this context, queries are provided in textual form, e.g., keywords or sentences about the company background and project requirements, or even existing methodology descriptions (for the second scenario). Standard search methods based on syntactic techniques [5] are often inadequate for capturing the similarity between documents, as they ignore the semantics of the terms that compose them. For instance, without exploiting semantics, i.e., synonyms and related terms, the document fragment D1 “...clients for your small business enterprise...” would wrongly be deemed irrelevant to the query fragment Q1 “...product requirements specified by the customer...”, even though “client” and “customer” are synonyms. Moreover, terms may be ambiguous, i.e., they may have more than one possible meaning.
For instance, although the document fragment D2 “Distributed applications partition workloads between servers and clients...” contains “client”, the term is used there in a completely different sense, so D2 should not be presented among the results for Q1.
In this paper, we propose a semantic method for searching ORM documents, implemented in the Semantic Helper component of the FACIT-SME solution. It exploits a standard information retrieval weighting/ranking scheme, extended to take synonyms and related terms into account, together with Word Sense Disambiguation (WSD) techniques. The method offers two main advantages: (1) it is fully automatic and semantic, overcoming the limitations of standard syntactic techniques; (2) it is designed for IT SMEs, providing them with a flexible, easy-to-apply method that requires neither large investments nor specific prior knowledge.
The rest of the paper is organized as follows: in Section 2, we describe the Semantic Helper and the phases of the processes it supports; in Section 3, we describe the experimental evaluation of our method; Section 4 concludes the work and briefly analyzes related work.
2 The Semantic Helper
The Semantic Helper supports the FACIT-SME solution by performing two main processes (see Figure 1):
1. Semantic Glossary Population: during this offline process, statistical and semantic information is automatically extracted from the ORM documents and stored in a repository called the Semantic Glossary;
2. Relevant Document Ranking/Retrieval: during this online process, user queries are processed and relevant documents are identified by exploiting the information provided by the Semantic Glossary.
Across these two processes, we can identify three main phases: (a) keyword extraction and enrichment; (b) Semantic Glossary generation; (c) semantic similarity computation.
Keyword Extraction and Enrichment.
The goal of this phase (involved in both processes) is to automatically extract, normalize and disambiguate terms.
[Figure 1 (diagram labels). Semantic Glossary Population (offline process): New Document, ORM Repository, Keyword Extraction and Enrichment, Semantic Glossary. Relevant Document Ranking/Retrieval (online process): Query, Keyword Extraction and Enrichment, Semantic Similarity Computation, Semantic Glossary, ORM Document Ranking. Knowledge Sources: WordNet, IEEE Vocabulary.]
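The synonym problem described in the Introduction (D1 versus Q1) can be made concrete with a minimal sketch. It is not the Semantic Helper's implementation: the SYNONYMS table is a hypothetical stand-in for WordNet-style synonym sets, the stopword list and plural stripping are deliberately crude, and the score is a plain count of matched query terms rather than the paper's weighting/ranking scheme.

```python
import re

# Hypothetical synonym table standing in for WordNet-style synonym sets
# (assumption: a real system would query external dictionaries instead).
SYNONYMS = {
    "client": {"client", "customer"},
    "customer": {"client", "customer"},
}

# Hypothetical minimal stopword list, just enough for the example fragments.
STOPWORDS = {"and", "by", "for", "the", "your"}

D1 = "...clients for your small business enterprise..."
D2 = "Distributed applications partition workloads between servers and clients..."
Q1 = "...product requirements specified by the customer..."


def terms(text: str) -> set[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords, strip a plural 's'."""
    result = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOPWORDS:
            continue
        if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
            word = word[:-1]  # crude plural normalization: "clients" -> "client"
        result.add(word)
    return result


def syntactic_score(query: str, doc: str) -> int:
    """Number of query terms that literally occur in the document."""
    return len(terms(query) & terms(doc))


def synonym_score(query: str, doc: str) -> int:
    """Number of query terms that occur in the document directly or via a synonym."""
    doc_terms = terms(doc)
    return sum(1 for t in terms(query) if SYNONYMS.get(t, {t}) & doc_terms)


if __name__ == "__main__":
    for name, doc in [("D1", D1), ("D2", D2)]:
        print(name, syntactic_score(Q1, doc), synonym_score(Q1, doc))
```

Running it prints `D1 0 1` and `D2 0 1`: synonym expansion recovers D1, which purely syntactic matching misses, but it equally rewards D2, whose “clients” are network clients. This is the ambiguity problem that motivates adding WSD.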
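The role of WSD in the D2 example can be illustrated with the simplified Lesk algorithm, a classic dictionary-based WSD technique that chooses the sense whose gloss shares the most words with the surrounding context. This is a swapped-in technique for illustration only; this excerpt does not state which WSD technique the Semantic Helper uses. The sense labels (BUYER, NETWORK_CLIENT), the hand-written glosses (not quoted from WordNet), and the stopword list are all hypothetical.

```python
import re

# Hypothetical stopword list covering the example fragments and glosses.
STOPWORDS = {"a", "and", "by", "for", "from", "or", "over", "that", "the", "your"}

# Hypothetical sense inventory with hand-written glosses (assumption: a real
# system would draw senses and glosses from WordNet or a domain vocabulary).
# Synonyms share a sense label, so "client" and "customer" match only when
# both are used in the BUYER sense.
BUYER_GLOSS = "a person or business that buys goods or services from a company"
SENSES = {
    "client": {
        "BUYER": BUYER_GLOSS,
        "NETWORK_CLIENT": "a computer program that requests services from a server over a network",
    },
    "customer": {"BUYER": BUYER_GLOSS},
}

D1 = "...clients for your small business enterprise..."
D2 = "Distributed applications partition workloads between servers and clients..."
Q1 = "...product requirements specified by the customer..."


def normalize(text: str) -> set[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords, strip a plural 's'."""
    result = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOPWORDS:
            continue
        if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
            word = word[:-1]
        result.add(word)
    return result


def disambiguate(word: str, context: str) -> str:
    """Simplified Lesk: pick the sense whose gloss overlaps most with the context.

    Ties are resolved in favor of the sense listed first.
    """
    context_words = normalize(context) - {word}
    best_sense, best_overlap = "", -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & normalize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


def same_sense(query_word: str, query: str, doc_word: str, doc: str) -> bool:
    """True when the query term and the document term resolve to the same sense."""
    return disambiguate(query_word, query) == disambiguate(doc_word, doc)


if __name__ == "__main__":
    print(disambiguate("client", D1))  # context shares "business" with the BUYER gloss
    print(disambiguate("client", D2))  # context shares "server" with the NETWORK_CLIENT gloss
    print(same_sense("customer", Q1, "client", D1))
    print(same_sense("customer", Q1, "client", D2))
```

With these toy glosses, “client” in D1 resolves to BUYER and “client” in D2 to NETWORK_CLIENT, so a sense-aware match keeps D1 for Q1 and discards D2. Gloss overlap on such short contexts is fragile; richer glosses, related-term sets, or domain vocabularies would be needed in practice.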